Rule-based Text Normalization for Malay Social Media Texts
Similar resources
Rule-Based Normalization of Historical Texts
This paper deals with normalization of language data from Early New High German. We describe an unsupervised, rule-based approach which maps historical wordforms to modern wordforms. Rules are specified in the form of context-aware rewrite rules that apply to sequences of characters. They are derived from two aligned versions of the Luther bible and weighted according to their frequency. The eva...
A Cascaded Approach for Social Media Text Normalization of Turkish
Text normalization is an indispensable stage for natural language processing of social media data with available NLP tools. We divide the normalization problem into seven categories, namely: letter case transformation, replacement rules & lexicon lookup, proper noun detection, deasciification, vowel restoration, accent normalization, and spelling correction. We propose a cascaded approach where each...
VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text
The inherent nature of social media content poses serious challenges to practical applications of sentiment analysis. We present VADER, a simple rule-based model for general sentiment analysis, and compare its effectiveness to eleven typical state-of-practice benchmarks including LIWC, ANEW, the General Inquirer, SentiWordNet, and machine learning oriented techniques relying on Naive Bayes, Max...
A Beam-Search Decoder for Normalization of Social Media Text with Application to Machine Translation
Social media texts are written in an informal style, which hinders other natural language processing (NLP) applications such as machine translation. Text normalization is thus important for processing of social media text. Previous work mostly focused on normalizing words by replacing an informal word with its formal form. In this paper, to further improve other downstream NLP applications, we ...
An architecture for Malay Tweet normalization
Research in natural language processing has increasingly focused on normalizing Twitter messages. While several well-defined approaches have been proposed for the English language, the problem remains far from solved for other languages, such as Malay. Thus, in this paper, we propose an approach to normalize Malay Twitter messages based on corpus-driven analysis. An archi...
Journal
Journal title: International Journal of Advanced Computer Science and Applications
Year: 2020
ISSN: 2156-5570, 2158-107X
DOI: 10.14569/ijacsa.2020.0111021